{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "# Visualizing what convnets learn\n",
    "\n",
    "**Author:** [fchollet](https://twitter.com/fchollet)<br>\n",
    "**Date created:** 2020/05/29<br>\n",
    "**Last modified:** 2020/05/29<br>\n",
    "**Description:** Displaying the visual patterns that convnet filters respond to."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "In this example, we look into what sort of visual patterns image classification models\n",
    "learn. We'll be using the `ResNet50V2` model, trained on the ImageNet dataset.\n",
    "\n",
    "Our process is simple: we will create input images that maximize the activation of\n",
    "specific filters in a target layer (picked somewhere in the middle of the model: layer\n",
    "`conv3_block4_out`). Such images represent a visualization of the\n",
    "pattern that the filter responds to.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Setup\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "\n",
    "# The dimensions of our input image\n",
    "img_width = 180\n",
    "img_height = 180\n",
    "# Our target layer: we will visualize the filters from this layer.\n",
    "# See `model.summary()` for list of layer names, if you want to change this.\n",
    "layer_name = \"conv3_block4_out\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Build a feature extraction model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "# Build a ResNet50V2 model loaded with pre-trained ImageNet weights\n",
    "model = keras.applications.ResNet50V2(weights=\"imagenet\", include_top=False)\n",
    "\n",
    "# Set up a model that returns the activation values for our target layer\n",
    "layer = model.get_layer(name=layer_name)\n",
    "feature_extractor = keras.Model(inputs=model.inputs, outputs=layer.output)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Set up the gradient ascent process\n",
    "\n",
    "The \"loss\" we will maximize is simply the mean of the activation of a specific filter in\n",
    "our target layer. To avoid border effects, we exclude border pixels.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "\n",
    "def compute_loss(input_image, filter_index):\n",
    "    activation = feature_extractor(input_image)\n",
    "    # We avoid border artifacts by only involving non-border pixels in the loss.\n",
    "    filter_activation = activation[:, 2:-2, 2:-2, filter_index]\n",
    "    return tf.reduce_mean(filter_activation)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Our gradient ascent function simply computes the gradients of the loss above\n",
    "with regard to the input image, and update the update image so as to move it\n",
    "towards a state that will activate the target filter more strongly.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "\n",
    "@tf.function\n",
    "def gradient_ascent_step(img, filter_index, learning_rate):\n",
    "    with tf.GradientTape() as tape:\n",
    "        tape.watch(img)\n",
    "        loss = compute_loss(img, filter_index)\n",
    "    # Compute gradients.\n",
    "    grads = tape.gradient(loss, img)\n",
    "    # Normalize gradients.\n",
    "    grads = tf.math.l2_normalize(grads)\n",
    "    img += learning_rate * grads\n",
    "    return loss, img\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Set up the end-to-end filter visualization loop\n",
    "\n",
    "Our process is as follow:\n",
    "\n",
    "- Start from a random image that is close to \"all gray\" (i.e. visually netural)\n",
    "- Repeatedly apply the gradient ascent step function defined above\n",
    "- Convert the resulting input image back to a displayable form, by normalizing it,\n",
    "center-cropping it, and restricting it to the [0, 255] range.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "\n",
    "def initialize_image():\n",
    "    # We start from a gray image with some random noise\n",
    "    img = tf.random.uniform((1, img_width, img_height, 3))\n",
    "    # ResNet50V2 expects inputs in the range [-1, +1].\n",
    "    # Here we scale our random inputs to [-0.125, +0.125]\n",
    "    return (img - 0.5) * 0.25\n",
    "\n",
    "\n",
    "def visualize_filter(filter_index):\n",
    "    # We run gradient ascent for 20 steps\n",
    "    iterations = 30\n",
    "    learning_rate = 10.0\n",
    "    img = initialize_image()\n",
    "    for iteration in range(iterations):\n",
    "        loss, img = gradient_ascent_step(img, filter_index, learning_rate)\n",
    "\n",
    "    # Decode the resulting input image\n",
    "    img = deprocess_image(img[0].numpy())\n",
    "    return loss, img\n",
    "\n",
    "\n",
    "def deprocess_image(img):\n",
    "    # Normalize array: center on 0., ensure variance is 0.15\n",
    "    img -= img.mean()\n",
    "    img /= img.std() + 1e-5\n",
    "    img *= 0.15\n",
    "\n",
    "    # Center crop\n",
    "    img = img[25:-25, 25:-25, :]\n",
    "\n",
    "    # Clip to [0, 1]\n",
    "    img += 0.5\n",
    "    img = np.clip(img, 0, 1)\n",
    "\n",
    "    # Convert to RGB array\n",
    "    img *= 255\n",
    "    img = np.clip(img, 0, 255).astype(\"uint8\")\n",
    "    return img\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Let's try it out with filter 0 in the target layer:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image, display\n",
    "\n",
    "loss, img = visualize_filter(0)\n",
    "keras.preprocessing.image.save_img(\"0.png\", img)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "This is what an input that maximizes the response of filter 0 in the target layer would\n",
    "look like:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "display(Image(\"0.png\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Visualize the first 64 filters in the target layer\n",
    "\n",
    "Now, let's make a 8x8 grid of the first 64 filters\n",
    "in the target layer to get of feel for the range\n",
    "of different visual patterns that the model has learned.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "# Compute image inputs that maximize per-filter activations\n",
    "# for the first 64 filters of our target layer\n",
    "all_imgs = []\n",
    "for filter_index in range(64):\n",
    "    print(\"Processing filter %d\" % (filter_index,))\n",
    "    loss, img = visualize_filter(filter_index)\n",
    "    all_imgs.append(img)\n",
    "\n",
    "# Build a black picture with enough space for\n",
    "# our 8 x 8 filters of size 128 x 128, with a 5px margin in between\n",
    "margin = 5\n",
    "n = 8\n",
    "cropped_width = img_width - 25 * 2\n",
    "cropped_height = img_height - 25 * 2\n",
    "width = n * cropped_width + (n - 1) * margin\n",
    "height = n * cropped_height + (n - 1) * margin\n",
    "stitched_filters = np.zeros((width, height, 3))\n",
    "\n",
    "# Fill the picture with our saved filters\n",
    "for i in range(n):\n",
    "    for j in range(n):\n",
    "        img = all_imgs[i * n + j]\n",
    "        stitched_filters[\n",
    "            (cropped_width + margin) * i : (cropped_width + margin) * i + cropped_width,\n",
    "            (cropped_height + margin) * j : (cropped_height + margin) * j\n",
    "            + cropped_height,\n",
    "            :,\n",
    "        ] = img\n",
    "keras.preprocessing.image.save_img(\"stiched_filters.png\", stitched_filters)\n",
    "\n",
    "from IPython.display import Image, display\n",
    "\n",
    "display(Image(\"stiched_filters.png\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Image classification models see the world by decomposing their inputs over a \"vector\n",
    "basis\" of texture filters such as these.\n",
    "\n",
    "See also\n",
    "[this old blog post](https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html)\n",
    "for analysis and interpretation.\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "visualizing_what_convnets_learn",
   "private_outputs": false,
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}